@shayne-fletcher
Contributor

Differential Revision: D87582955

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 20, 2025
meta-codesync bot commented Nov 20, 2025

@shayne-fletcher has exported this pull request. If you are a Meta employee, you can view the originating Diff in D87582955.

shayne-fletcher added a commit to shayne-fletcher/monarch-1 that referenced this pull request Nov 21, 2025
Summary: Pull Request resolved: meta-pytorch#1960

Differential Revision: D87582955
@shayne-fletcher shayne-fletcher force-pushed the export-D87582955 branch 2 times, most recently from b467b85 to 69092fd Compare November 21, 2025 21:23
shayne-fletcher added a commit to shayne-fletcher/monarch-1 that referenced this pull request Nov 21, 2025
Summary:

This is research into using systemd to manage transient units and surface their output.

This change wires up an optional `systemd` feature in hyperactor_mesh and teaches BUCK to build with that feature. GitHub / OSS builds never enable the feature, so they avoid depending on libsystemd being present on the runners.

In Rust, src/systemd.rs gains more transient-unit tests that experiment with log aggregation.

The journald-based test is gated on `target_os = "linux"` and the `systemd` feature, and it soft-fails if the journal or the D-Bus session isn't available; GitHub CI and Meta devgpu/devvm journal configurations don't let us rely on that path.
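A minimal sketch of what such gating and soft-failing could look like. This is not the code from the diff: the helper `skip_reason`, the test name, and the choice to probe `/run/systemd/journal/socket` and `DBUS_SESSION_BUS_ADDRESS` are assumptions made for illustration.

```rust
use std::path::Path;

/// Hypothetical helper (not from the diff): explains why the journald test
/// should be skipped, or returns `None` when the prerequisites look present.
fn skip_reason(journal_socket: &Path, session_bus: Option<&str>) -> Option<String> {
    if !journal_socket.exists() {
        return Some(format!(
            "journal socket {} not found",
            journal_socket.display()
        ));
    }
    match session_bus {
        Some(addr) if !addr.is_empty() => None,
        _ => Some("no D-Bus session bus address".to_string()),
    }
}

// Compiled only on Linux and only when the optional `systemd` feature is on,
// so builds without the feature never see this test.
#[cfg(all(target_os = "linux", feature = "systemd"))]
#[test]
fn journald_roundtrip() {
    // Assumed probes: the journald socket path and the session bus variable.
    let bus = std::env::var("DBUS_SESSION_BUS_ADDRESS").ok();
    let socket = Path::new("/run/systemd/journal/socket");
    if let Some(why) = skip_reason(socket, bus.as_deref()) {
        // Soft-fail: report the reason and pass instead of failing the run.
        eprintln!("skipping journald test: {why}");
        return;
    }
    // ... start a transient unit and read its output back from the journal ...
}
```

Keeping the probe in a plain function that takes its inputs as parameters means the skip logic itself can be exercised on any machine, independent of whether journald is actually usable there.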

In response, this diff explores a more robust approach: units write to file descriptors that we pass in over D-Bus (via `UnixStream::pair` and `Fd`), and we aggregate logs from one or many units in-process using async readers.
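The socket-pair idea can be sketched with the standard library alone. This is an illustration under stated assumptions, not the implementation in the diff: blocking reader threads stand in for the async readers, the write ends are written to locally instead of being handed to systemd over D-Bus, and the names `aggregate_logs` and `demo_two_units` are hypothetical.

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;
use std::sync::mpsc;
use std::thread;

/// Spawn one reader per (unit name, read end) and collect every line, tagged
/// with the unit that wrote it. Returns once all write ends have been closed.
fn aggregate_logs(sources: Vec<(String, UnixStream)>) -> Vec<(String, String)> {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();
    for (unit, stream) in sources {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            // `lines()` ends at EOF, i.e. when the unit's write end is closed.
            for line in BufReader::new(stream).lines() {
                let Ok(text) = line else { break };
                if tx.send((unit.clone(), text)).is_err() {
                    break;
                }
            }
        }));
    }
    // Drop the original sender so the receiver ends when all readers finish.
    drop(tx);
    let collected: Vec<(String, String)> = rx.into_iter().collect();
    for handle in handles {
        let _ = handle.join();
    }
    collected
}

/// Simulate two units: each gets the write end of a socket pair, writes two
/// lines, and closes it. In the real setup the write end would go to systemd.
fn demo_two_units() -> std::io::Result<Vec<(String, String)>> {
    let mut sources = Vec::new();
    for unit in ["unit-a", "unit-b"] {
        let (ours, mut theirs) = UnixStream::pair()?;
        writeln!(theirs, "hello from {unit}")?;
        writeln!(theirs, "{unit} exiting")?;
        drop(theirs); // closing the write end signals EOF to the reader
        sources.push((unit.to_string(), ours));
    }
    Ok(aggregate_logs(sources))
}
```

Lines from different units may interleave in any order, which is why each line carries its unit name; ordering within a single unit is preserved because one reader drains each socket.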

This gives us a workable story for "systemd for unit management, Unix sockets for log transport" even where journald isn't practically usable.

Differential Revision: D87582955
Labels

CLA Signed This label is managed by the Meta Open Source bot. fb-exported meta-exported
